Python classes may optionally define any of a long list of methods that, when defined, are called when instances of the class are used in certain situations. For example, a class may define under what circumstances its instances should be considered equivalent by defining a method called __eq__. If the __eq__ method is defined, it is invoked whenever an instance of the class appears in an equality test using the == operator.
The purpose of these so-called “magic methods” is to overload Python operators or built-in functions. They are defined using the double-underscore syntax to avoid a case where a programmer accidentally defines a method with the same name, opting in to the special behavior without meaning to. Magic methods provide consistency between the contracts that built-in classes (including primitives such as integers and strings) provide, as well as the contracts that custom classes provide. If you want to test for equivalence in Python, you should always be able to use == to do so, regardless of whether you are testing two integers, two instances of a class that you wrote for your specific application, or even two instances of unrelated classes.
This chapter explores magic methods, how they work, and what magic methods are available.
In Python, magic methods follow a consistent pattern—the name of the method is wrapped on both sides by two underscores. For example, when an instance of a class is instantiated, the method that runs is __init__ (not init).
This convention exists to provide a certain level of future-proofing. You can name methods as you please, and not have to worry that your method name will later be used by Python to assign some special (and unintended) significance, provided that you do not name your methods such that they both begin and end with two underscores.
When verbally referring to such methods (for example, in talks at conferences), many people choose to use the coined term “dunder” to describe them. So, __init__ ends up being pronounced as dunder-init.
Each magic method serves a specific purpose; it is a hook that is run when particular syntax appears. For example, the __init__ method is run when a new instance of a class is created. Consider the following simple class:
class MyClass(object):
    def __init__(self):
        print('The __init__ method is running.')
Of course, this class does nothing, except for print to standard out upon instantiation. That is enough to establish that the __init__ method runs in this situation, though.
>>> mc = MyClass()
The __init__ method is running.
>>>
What is important to realize here is that you are not actually calling the __init__ method directly. Rather, the Python interpreter simply knows to call __init__ upon object instantiation.
Each of the magic methods works this way. There is a particular spelling and method signature that is taken (sometimes the method signature is variable), and the method is actually invoked in a particular situation.
The __eq__ method (mentioned earlier) takes both the obligatory self argument and a second positional argument, which is the object being compared against.
class MyClass(object):
    def __eq__(self, other):
        # All instances of MyClass are equivalent to one another, and they
        # are not equivalent to instances of other classes.
        return type(self) == type(other)
Notice that this __eq__ method takes a second argument, other. Because the __eq__ method runs when Python is asked to make an equivalence check with the == operator, other will be set to the object on the other side of ==.
This example __eq__ method simply decides equality based solely on whether it is another instance of MyClass. Therefore, you get the following results:
>>> MyClass() == MyClass()
True
>>> MyClass() == 42
False
Two different instances of MyClass are equivalent because type(self) == type(other) evaluates to True. On the other hand, 42 is an int, so the type check fails. Thus, __eq__ (and, therefore, the == operator) returns False.
The Python interpreter understands a rich set of magic methods that serve many different purposes, from comparison checks and sorting, to hooks for various language features. This book has already explored some of these in Chapter 2, “Context Managers,” and Chapter 3, “Generators.”
These methods are run when instances of the class are created or destroyed.
The __init__ method of an object runs immediately after the instance is created. It must take one positional argument (self) and then can take any number of required or optional positional arguments, and any number of keyword arguments.
This method signature is flexible because the arguments passed to the class instantiation call are what are sent to __init__.
Consider the following class with an __init__ method that takes an optional keyword argument:
import random
class Dice(object):
    """A class representing a dice with an arbitrary number
    of sides.
    """
    def __init__(self, sides=6):
        self._sides = sides

    def roll(self):
        return random.randint(1, self._sides)
To instantiate a standard, six-sided die, you need only call the class with no arguments: die = Dice(). This creates the Dice instance (more on that later), and then calls the new instance's __init__ method, passing no arguments except self. Because the sides argument is not provided, the default of 6 is used.
To instead create a d20, however, you simply pass the sides argument to the call to Dice, which forwards it to the __init__ function.
>>> die = Dice(sides=20)
>>> die._sides
20
>>> die.roll()
20
>>> die.roll()
18
It is worth noting that the purpose of the __init__ method is not to actually create the new object (that is performed by __new__). Rather, the purpose is to provide initial data to the object after it has been created.
What this means in practice is that the __init__ method does not (and should not) actually return anything. All __init__ methods in Python return None, and returning anything else will raise TypeError.
The __init__ method is probably the single most common magic method that custom classes define. Most classes are instantiated with extra variables that customize their implementation in some way, and the __init__ method is the appropriate place for this behavior.
The __new__ method actually precedes the __init__ method in the dance of creating an instance of a class. Whereas the __init__ method is responsible for customizing an instance once it has been created, the __new__ method is responsible for actually creating and returning that instance.
The __new__ method is always static. It does not need to be explicitly decorated as such. The first and most important argument is the class of which an instance is being created (by convention, called cls).
In most cases, the remaining arguments to __new__ should mirror the arguments to __init__. The arguments sent to the call to the class will be sent first to __new__ (because it is called first), and then to __init__.
Realistically, most classes do not actually need to define __new__ at all. The built-in implementation is adequate. When classes do need to define __new__, they will almost always want to reference the superclass implementation first, as shown here, before doing whatever work is necessary on the instance:
class MyClass(object):
    def __new__(cls, [...]):
        instance = super(MyClass, cls).__new__(cls, [...])
        [do work on instance]
        return instance
Normally, you will want the __new__ method to return an instance of the class being instantiated. However, occasionally this may not be true. Note, however, that the __init__ half of the dance will only be performed if you return an instance of the class whose __new__ method is being run (or of one of its subclasses). If you return anything else, the __init__ method will not be invoked.
You do this primarily because, in situations where an instance of a different class is returned, the __init__ method was likely run by whatever means created that instance within the __new__ method, and running it twice would be problematic.
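To make this concrete, consider a sketch of the most common practical reason to override __new__: subclassing an immutable type, whose value cannot be assigned in __init__. The Inches class here is invented purely for illustration; it is not part of the examples above.

```python
class Inches(float):
    """Hypothetical float subclass illustrating why __new__ matters.

    Because float is immutable, the value must be supplied when the
    instance is created in __new__; by the time __init__ runs, the
    instance already exists and cannot be changed.
    """
    def __new__(cls, value):
        # Delegate actual creation to the superclass, then return the
        # new instance, following the skeleton shown above.
        instance = super(Inches, cls).__new__(cls, value)
        return instance

    def __str__(self):
        return '%g"' % float(self)
```

An Inches object behaves as a float in arithmetic (Inches(6) + 2 is 8.0), but its string form carries the unit: str(Inches(6)) is '6"'.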
Whereas the __new__ and __init__ methods are invoked when an object is being created, the __del__ method is invoked when an object is being destroyed.
It is relatively rare for developers to destroy their objects in Python directly. (You should do so with the del keyword if you need to.) Python's memory management is good enough that it is generally acceptable simply to allow the garbage collector to do so.
That said, the __del__ method is run regardless of how an object comes to be destroyed, whether it is through a direct deletion, or through memory reclamation by the garbage collector. You can see this behavior at work by making the following class that deletes noisily:
class Xon(object):
    def __del__(self):
        print('AUUUUUUGGGGGGHH!')
If you make Xon objects but do not assign them to variables, they will be marked as collectable by the garbage collector, which will collect them in short order as other program statements run.
>>> Xon()
<__main__.Xon object at 0x1022b8890>
>>> 'foo'
AUUUUUUGGGGGGHH!
'foo'
>>>
What happened here? First, an Xon object was created (but not assigned to a variable, so there is no real reason for the Python interpreter to keep it around). Next, the interpreter was sent an immutable string, which it must assign to memory (and then immediately release, because it was not assigned to a variable either, but that is not important).
In the particular interpreters I was using (CPython 3.4.0 and CPython 2.7.6), that memory operation causes the garbage collector to take a pass through its table. It finds the Xon object and deletes it. This triggers the Xon object's __del__ method, which then loudly screams as it is unceremoniously sent to the great bit bucket beyond.
You see similar (but more immediate) behavior if you delete an Xon object directly, as shown here:
>>> x = Xon()
>>> del x
AUUUUUUGGGGGGHH!
In both cases, the principle is the same. No matter whether the deletion is directly invoked in code or automatically triggered by the garbage collector, the __del__ method is invoked identically.
It is worth noting that __del__ methods are generally unable to raise exceptions in any meaningful way. Because deletions are usually triggered in the background by the garbage collector, there is no good way for exceptions to bubble. Therefore, raising any kind of exception in a __del__ method just prints some nastiness to standard error, and it is generally considered inappropriate to raise exceptions there.
Several magic methods are available in Python to take a complex object and make it into a more primitive, or more widely used type. For example, types such as int, str, and bool are used everywhere in Python, and it is useful for complex objects to know what their representations are in these formats.
By far, the most commonly used type conversion magic method is __str__. This method takes one positional argument (self), is invoked when an object is passed to the str constructor, and is expected to return a string.
>>> class MyObject(object):
...     def __str__(self):
...         return 'My Awesome Object!'
...
>>> str(MyObject())
'My Awesome Object!'
Because strings are so ubiquitous, it is very often useful for classes to define a __str__ method.
There is a bit more to this situation, however. In Python 2, strings are ASCII strings, whereas in Python 3, strings are Unicode strings. This actually causes a great deal of pain, and this book devotes an entire chapter to the subject (Chapter 8, “Strings and Bytestrings”).
Suffice it to say here, however, that Python 2 does have Unicode strings, and Python 3 introduces a type called bytes (or bytestrings, as they are sometimes called), which are roughly analogous to the old Python 2 ASCII strings.
These string brethren have their own magic methods. Python 2 honors a __unicode__ method that is invoked when an object is passed to the unicode constructor. Similarly, Python 3 honors a __bytes__ method that is invoked when an object is passed to the bytes constructor. In both cases, the method is expected to return the proper type.
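A Python 3 __bytes__ method might look like the following sketch. The Packet class and its payload attribute are hypothetical, invented here to show the hook; the only requirement is that __bytes__ return a bytes object.

```python
class Packet(object):
    """A hypothetical object that knows its own wire format."""
    def __init__(self, payload):
        self.payload = payload

    def __str__(self):
        return self.payload

    def __bytes__(self):
        # Invoked by bytes(packet) in Python 3; an explicit encoding
        # is required to go from text to bytes.
        return self.payload.encode('utf-8')
```

With this in place, str(Packet('ping')) returns the text 'ping', while bytes(Packet('ping')) returns b'ping'.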
The __str__ method is invoked in certain other situations, too (essentially, situations where str is called under the hood). For example, encountering %s in a format string will run the corresponding argument through str, as shown here:
>>> 'This is %s' % MyObject()
'This is My Awesome Object!'
In this case, however, the formatting machinery is a bit smarter. For example, if %s is encountered in a unicode format string in Python 2, the interpreter will attempt to use __unicode__ first. Consider the following code, running in Python 2.7:
>>> class Which(object):
...     def __str__(self):
...         return 'string'
...     def __unicode__(self):
...         return u'unicode'
...
>>> u'The %s was used.' % Which()
u'The unicode was used.'
>>> 'The %s was used.' % Which()
'The string was used.'
Another common need is for an object to define whether it should be considered True or False, either if expressly converted to a Boolean, or in a situation where a Boolean representation is required (such as if the object is the subject of an if statement).
This is handled in Python 3 with the __bool__ magic method, which in Python 2 is instead called __nonzero__. In both cases, the method takes one positional argument (self) and returns either True or False.
It is often unnecessary to define an explicit __bool__ method. If no __bool__ method is defined but a __len__ method (explained further shortly) is defined, the latter will be used, and these often overlap.
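When a class does need distinct truth-testing behavior, a minimal sketch looks like the following. The Toggle class is hypothetical; the aliasing trick on the last line is one common way to support both Python versions with a single implementation.

```python
class Toggle(object):
    """A hypothetical on/off switch with explicit truth testing."""
    def __init__(self, on=False):
        self.on = on

    def __bool__(self):
        # Python 3 consults this for bool(toggle), `if toggle:`, etc.
        return bool(self.on)

    # Python 2 looks for __nonzero__ instead of __bool__; aliasing
    # keeps the single implementation working on both versions.
    __nonzero__ = __bool__
```

Now bool(Toggle(True)) is True, and a bare Toggle() is falsy in an if statement.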
Occasionally, it is valuable for complex objects to be able to convert to primitive numbers. If an object defines an __int__ method, which should return an int, it will be invoked if the object is passed to the int constructor.
Similarly, objects that define __float__ and __complex__ will have those methods invoked if they are passed to float and complex, respectively.
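These numeric conversion hooks follow the same pattern as __str__. As a brief sketch (the Temperature class is invented for illustration):

```python
class Temperature(object):
    """A hypothetical reading that converts to primitive numbers."""
    def __init__(self, celsius):
        self.celsius = celsius

    def __int__(self):
        # int(temp) truncates toward zero, matching int() on a float.
        return int(self.celsius)

    def __float__(self):
        return float(self.celsius)
```

Passing a Temperature to int or float now does the obvious thing: int(Temperature(36.6)) is 36, and float(Temperature(36.6)) is 36.6.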
Objects are being compared when they are checked for equivalence (with == or !=), or for relative value to one another (such as with <, <=, >, and >=).
Each of these operators maps to a magic method in Python.
The following methods support testing equality using == and !=.
As already explored, the __eq__ method is called when two objects are compared with the == operator. The method must take two positional arguments (by convention, self and other), which are the two objects being compared.
Under most circumstances, the object on the left side has its __eq__ method checked first. It is used if it is defined (and returns something other than NotImplemented). Otherwise, the __eq__ method of the object on the right side is used instead (with the argument order reversed).
Consider the following class that is noisy when given equivalence tests (and then returns False unless it is the exact same object):
class MyClass(object):
    def __eq__(self, other):
        print('The following are being tested for equivalence:\n'
              '%r\n%r' % (self, other))
        return self is other
You can see the order in action based on which side of the operator your objects are on.
>>> c1 = MyClass()
>>> c2 = MyClass()
>>> c1 == c2
The following are being tested for equivalence:
<__main__.MyClass object at 0x1066de590>
<__main__.MyClass object at 0x1066de390>
False
>>> c2 == c1
The following are being tested for equivalence:
<__main__.MyClass object at 0x1066de390>
<__main__.MyClass object at 0x1066de590>
False
>>> c1 == c1
The following are being tested for equivalence:
<__main__.MyClass object at 0x1066de590>
<__main__.MyClass object at 0x1066de590>
True
Notice how the order in which the objects are dumped to standard out is reversed. This is because the order in which they were sent to __eq__ was reversed. This also means that there is no inherent requirement that your equivalence check be commutative. However, unless you have a really good reason, you should ensure that equivalence is consistently commutative.
You can observe another facet of this behavior by comparing a MyClass object against something of a different type. Consider the following type with a plain __eq__ method that does nothing but return False:
class Unequal(object):
    def __eq__(self, other):
        return False
And, when you run equivalence tests against instances of these classes, you see different behavior based on the order in which they are called. When an instance of MyClass is on the left, its __eq__ method is called. When an instance of Unequal is on the left, its quieter counterpart is called instead.
>>> MyClass() == Unequal()
The following are being tested for equivalence:
<__main__.MyClass object at 0x1066de5d0>
<__main__.Unequal object at 0x1066de450>
False
>>> Unequal() == MyClass()
False
There is one exception to this rule on the order of objects sent to __eq__: subclasses. If one of the two objects being compared is an instance of a subclass of the other object's class, and that subclass overrides __eq__, the subclass's __eq__ method is tried first, regardless of which side of the operator it is on.
class MySubclass(MyClass):
    def __eq__(self, other):
        print('MySubclass\' __eq__ method is testing:\n'
              '%r\n%r' % (self, other))
        return False
Now, the same method with the same argument order will be invoked, regardless of the order in which arguments are provided to the operator.
>>> MyClass() == MySubclass()
MySubclass' __eq__ method is testing:
<__main__.MySubclass object at 0x1066de690>
<__main__.MyClass object at 0x1066de450>
False
>>> MySubclass() == MyClass()
MySubclass' __eq__ method is testing:
<__main__.MySubclass object at 0x1066de5d0>
<__main__.MyClass object at 0x1066de450>
False
The __ne__ method is the converse of the __eq__ method. It works the same way, except that it is invoked when the != operator is used.
Normally, it is not necessary to define an __ne__ method, provided that you always want the result to be the opposite of the returned value of __eq__. In Python 3, if no __ne__ method is defined, the interpreter runs the __eq__ method and negates the result. Python 2 does not do this, however, so classes that must support != on Python 2 should define __ne__ explicitly alongside __eq__.
It is possible to explicitly provide an __ne__ method for situations where you do not want this behavior.
These methods also handle comparison, but using comparison operators that test relative value (such as >).
The __lt__, __le__, __gt__, and __ge__ methods map to the <, <=, >, and >= operators, respectively. Like the equivalence methods, each of these methods should take two arguments (by convention, self and other), and return True if the relative comparison should be considered to hold, and False otherwise.
Usually, it is unnecessary to define all four of these methods. If the object on the left does not supply the requested method (or returns NotImplemented), the Python interpreter tries the reflected method on the object on the right: __lt__ and __gt__ are reflections of one another, as are __le__ and __ge__. The interpreter does not, however, derive __le__ from __lt__ and __eq__ (or __ge__ from __gt__ and __eq__) on its own; the functools.total_ordering class decorator exists for exactly that purpose.
This means that, in practice, it is usually only necessary to define __eq__ and __lt__ (or __gt__), apply functools.total_ordering to the class, and all six of the comparison operators will work in the way that you expect.
Another important (but easily overlooked) aspect of defining these methods is that they are what the built-in sorted function uses for sorting objects. Therefore, if you have a list of objects with these methods defined, passing that list to sorted automatically returns a sorted list, from least to greatest, based on the result of the comparison methods.
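As a sketch of that interaction, sorted needs only __lt__ to order instances. The Version class here is hypothetical, invented to demonstrate the hook:

```python
class Version(object):
    """A hypothetical two-part version number."""
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        # sorted() relies on this method to order instances.
        return (self.major, self.minor) < (other.major, other.minor)

    def __repr__(self):
        return 'Version(%d, %d)' % (self.major, self.minor)
```

Passing a list of Version objects to sorted now orders them from least to greatest, with no key function required: sorted([Version(2, 0), Version(1, 10), Version(1, 5)]) places Version(1, 5) first.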
The __cmp__ method is an older (and less preferred) way of defining relative comparisons for objects. It is checked if (and only if) the comparison methods described previously are not defined.
This method takes two positional arguments (by convention, self and other), and should return a negative integer if self is less than other, or a positive integer if self is greater than other. If self and other are equivalent, the method should return 0.
The __cmp__ method is deprecated in Python 2, and not available in Python 3.
These methods provide a mechanism to override the standard Python operators.
A set of magic methods is also available for overloading the various binary operators available in Python, such as +, -, and so on. Python actually supplies three magic methods for each operator, each of which takes two positional arguments (by convention, self and other).
The first of these is a vanilla method, in which an expression x + y maps to x.__add__(y), and the method simply returns the result.
The second is a reverse method. The reverse methods are called (with the operands swapped) if (and only if) the first operand does not supply the traditional method (or returns NotImplemented) and the operands are of different types. These methods are spelled the same way, but the method name is preceded by an r. Therefore, the expression x + y, where x does not define an __add__ method, would call y.__radd__(x).
The third and final magic method is the in-place method. In-place methods are called when the operators that modify the former variable in place (such as +=, -=, and so on) are used. These are spelled the same way, but the method name is preceded by an i. Therefore, the expression x += y would call x.__iadd__(y).
Normally, the in-place methods simply modify self in place and return it. However, this is not a strict requirement. It is also worth noting that it is only necessary to define an in-place method if the behavior of the straightforward method does not cleanly map. If the in-place method is not defined, the straightforward method is called instead, and its return value is assigned to the left operand.
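The three flavors can be sketched together in one small class. The Vector class below is hypothetical; in particular, the special-casing of 0 in __radd__ is just one common convenience (it is what allows the built-in sum, which starts from 0, to work on a list of vectors).

```python
class Vector(object):
    """A hypothetical 2-D vector demonstrating all three method flavors."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # v1 + v2: return a new object; do not mutate self.
        return Vector(self.x + other.x, self.y + other.y)

    def __radd__(self, other):
        # 0 + v: called because int does not know how to add a Vector.
        # Accepting 0 makes sum() work on a list of Vectors.
        if other == 0:
            return Vector(self.x, self.y)
        return self.__add__(other)

    def __iadd__(self, other):
        # v1 += v2: modify self in place and return it.
        self.x += other.x
        self.y += other.y
        return self
```

Here v1 + v2 calls v1.__add__(v2), sum([...]) reaches __radd__ for its initial 0 + vector step, and v1 += v2 calls v1.__iadd__(v2) and rebinds v1 to the returned object.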
Table 4.1 shows the full set of operator overloading magic methods.
Table 4.1 Operator Overloading Magic Methods
| Operator | Method | Reverse | In-place |
| --- | --- | --- | --- |
| + | __add__ | __radd__ | __iadd__ |
| - | __sub__ | __rsub__ | __isub__ |
| * | __mul__ | __rmul__ | __imul__ |
| / | __truediv__ | __rtruediv__ | __itruediv__ |
| // | __floordiv__ | __rfloordiv__ | __ifloordiv__ |
| % | __mod__ | __rmod__ | __imod__ |
| ** | __pow__ | __rpow__ | __ipow__ |
| & | __and__ | __rand__ | __iand__ |
| \| | __or__ | __ror__ | __ior__ |
| ^ | __xor__ | __rxor__ | __ixor__ |
| << | __lshift__ | __rlshift__ | __ilshift__ |
| >> | __rshift__ | __rrshift__ | __irshift__ |
These methods allow for overloading of all of the binary operators that are available in Python. Custom classes can (and should) define them when it is sensible to do so.
One binary operator, division (/), requires slightly more discussion. First, you need a bit of background. Originally, in Python, the division operator between two integers would always return an int, not a float. Essentially, what happens is that the division is performed and the floor of the result is taken. Therefore, 5 / 2 would return 2, and -5 / 2 would return -3. If you wanted a float result, at least one of the operands had to be a float. Therefore, 5.0 / 2 would return 2.5.
Python 3 changes this behavior, because many developers found it to be counterintuitive. In Python 3, division between two integers returns a float, and does so even if the result is a whole number. Thus, 5 / 2 is 2.5, and 4 / 2 is 2.0 (not 2). This is one of the backward-incompatible changes that Python 3 introduced to the language.
Because Python 3 introduced backward-incompatible changes, subsequent releases of the Python 2 series used a mechanism already in place to “opt in” to the new behavior: a special module called __future__, from which future behavior can be imported. In Python 2.6 and 2.7, developers can opt-in to the Python 3 behavior by issuing from __future__ import division.
This is important to discuss here because it alters which magic method is used. The __truediv__ method (and its siblings) in Table 4.1 is the Python 3 method. Python 2 originally provided __div__, and calls __div__ for the / operator unless division is imported from __future__, in which case it conforms to the Python 3 behavior and calls __truediv__.
In most cases, code that runs on Python 2 probably needs to be agnostic as to which division scheme is in effect. This means defining both the __div__ and __truediv__ methods. In most cases, it is probably completely acceptable to just map them to each other, as shown here:
class MyClass(object):
    def __truediv__(self, other):
        [...]

    __div__ = __truediv__
It is probably wise to make __truediv__ be the “proper” method, and __div__ the alias. The broader principle here is that any code that may even eventually run on Python 3 should be written to target Python 3 and accommodate Python 2, as opposed to the other way around.
Python also provides three unary operators: +, -, and ~. Notice that two of the symbols here are reused between unary and binary operators. This is fine. The interpreter is able to determine which is in use based on whether the expression is unary or binary.
The unary operator methods simply take a single positional argument (self), perform the operation, and return the result. The methods are called __pos__ (which maps to +), __neg__ (which maps to -), and __invert__ (which maps to ~).
Unary operators are straightforward. The expression ~x, for example, calls x.__invert__(). Consider the following string-like class that is able to return the string backward:
class ReversibleString(object):
    def __init__(self, s):
        self.s = s

    def __invert__(self):
        return self.s[::-1]

    def __str__(self):
        return self.s
And, in the Python interpreter, you would see the following:
>>> rs = ReversibleString('The quick brown fox jumped over the lazy dogs.')
>>> ~rs
'.sgod yzal eht revo depmuj xof nworb kciuq ehT'
So, what is happening here? The ReversibleString object is created and assigned to rs. The second statement, ~rs, is a simple unary expression. The result is not being assigned to a variable, which means that it is simply being discarded. The rs variable is not being modified in place. The interpreter, however, shows you the result, which is a str object that represents your string, backward.
Note that the return value is a str, not a ReversibleString. There is no obligation that these methods return a value of the same type as the operand, and your __invert__ method does not do so.
There is no reason why it cannot return a ReversibleString, however, and often returning an object of the same type is desirable.
class ReversibleString(object):
    def __init__(self, s):
        self.s = s

    def __invert__(self):
        return type(self)(self.s[::-1])

    def __repr__(self):
        return 'ReversibleString: %s' % self.s

    def __str__(self):
        return self.s
This iteration of ReversibleString returns a new ReversibleString instance from its __invert__ method. A custom repr has been added for demonstration purposes, because having the interpreter provide a memory address in the output is not useful.
The Python interpreter now shows slightly different output:
>>> rs = ReversibleString('The quick brown fox jumped over the lazy dogs.')
>>> ~rs
ReversibleString: .sgod yzal eht revo depmuj xof nworb kciuq ehT
Instead of getting a str object back, you now have a ReversibleString. This means that your inverted output is now invertible.
>>> ~~rs
ReversibleString: The quick brown fox jumped over the lazy dogs.
This is straightforward. The rs object is having its __invert__ method called. Then, the result of that expression is having its __invert__ method called. This is, therefore, equivalent to rs.__invert__().__invert__().
Python includes many built-in functions (the most common example being len) that are widely used and are nearly as much a part of the contract that an object observes as the operators are. Therefore, Python supplies magic methods that are invoked when an object is passed to those functions.
The most common built-in to be overloaded in this way is almost certainly len, which is the Pythonic way to determine the “length” of an item. The length of a string is the number of characters in the string, the length of a list is the number of elements within the list, and so on.
Objects can describe their length by defining a __len__ method. This method takes one positional argument (self) and should return an integer.
Consider the following class to represent a span of time:
class Timespan(object):
    def __init__(self, hours=0, minutes=0, seconds=0):
        self.hours = hours
        self.minutes = minutes
        self.seconds = seconds

    def __len__(self):
        return (self.hours * 3600) + (self.minutes * 60) + self.seconds
This class essentially takes a number of hours, minutes, and seconds; it then calculates the seconds that this represents and uses that as the length.
>>> ts = Timespan(hours=2, minutes=30, seconds=1)
>>> len(ts)
9001
It is worth noting that the __len__ method, if defined, also is used to determine whether an object is considered True or False if typecast to a bool or is used in an if statement, unless the object also defines a __bool__ method (or, in Python 2, __nonzero__).
This will actually do exactly what you expect the bulk of the time, so it often is not necessary to define a separate __bool__.
>>> bool(Timespan(hours=1, minutes=0, seconds=0))
True
>>> bool(Timespan(hours=0, minutes=0, seconds=0))
False
In Python 3.4, an additional method, __length_hint__, has been added. Its purpose is to provide an estimate of an object's length, which is allowed to be somewhat greater than or less than the actual length, and which can be used as a performance optimization. It takes one positional argument (self), and must return an integer greater than or equal to zero.
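As an illustrative sketch (the CountDown class is hypothetical), operator.length_hint consults __length_hint__ when __len__ is absent, which is typical for iterators that know roughly how many items remain:

```python
import operator

class CountDown(object):
    """A hypothetical iterator that can estimate its remaining length."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n

    def __length_hint__(self):
        # operator.length_hint() (Python 3.4+) consults this method
        # when __len__ is not defined.
        return self.n
```

Consumers such as list() may use the hint to presize internal storage; operator.length_hint(CountDown(5)) returns 5 without consuming the iterator.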
One of the most important built-in methods in Python is also potentially one of the most overlooked: repr. Any object can define a __repr__ method, which takes one positional argument (self).
Why is repr so important? An object's repr is how it will represent itself when output on the Python interactive terminal.
It is not generally useful to return an object in the terminal and have it render as <__main__.O object at 0x102cdf950>. In the vast majority of cases, an object's class and address in memory are not what you want to know.
Defining __repr__ allows you to give objects a more useful representation. Consider the following Timespan class with a useful __repr__ method:
class Timespan(object):
    def __init__(self, hours=0, minutes=0, seconds=0):
        self.hours = hours
        self.minutes = minutes
        self.seconds = seconds

    def __repr__(self):
        return 'Timespan(hours=%d, minutes=%d, seconds=%d)' % \
            (self.hours, self.minutes, self.seconds)
What happens when you work with Timespan objects on the terminal now?
>>> Timespan()
Timespan(hours=0, minutes=0, seconds=0)
>>> Timespan(hours=2, minutes=30)
Timespan(hours=2, minutes=30, seconds=0)
This is much more useful than a memory address!
Notice that in addition to communicating all the key attributes of a Timespan, the repr prints as a valid expression that instantiates a Timespan. This is incredibly valuable when it is possible. It intuitively communicates that you are working with an object generally, and a Timespan object specifically. Just printing out the timing information might leave open the interpretation that you are looking at a str or a timedelta, for example. Also, the Python interpreter could parse it if it's copied and pasted. That is a good thing.
What this really points to is a more general distinction that is important: repr and str have different purposes. Exactly how you delineate them is a matter of subtle differences of opinion, depending on what you read. But an all-encompassing understanding should be that an object's repr is intended for programmers (and machines, possibly), whereas an object's str is geared toward more public consumption. You would not want the Timespan's str to look like a class instantiation call. Most likely, it would be something intended for humans instead.
It is often very useful for an object's repr to return a valid Python expression to reconstruct the object. Many Python built-ins do this. The repr of an empty list is [], which is the expression to make an empty list.
When this is impossible or impractical, a good rule of thumb is to return something that looks like it is obviously an object, and is noisy about what its key properties are. As an example, an alternative repr for a Timespan object might be <Timespan: X hours, Y minutes, Z seconds>. The Python interpreter will not be able to parse that (unlike the repr used previously), but it is clear exactly what it is, and nobody will errantly expect it to be parsed, either.
Another often overlooked built-in function is the hash function. The purpose of the hash function is to reduce an object to an integer that identifies it. Hash values are not guaranteed to be unique; two distinct objects may share a hash. However, objects that compare as equivalent must hash to the same value for hash-based containers to work correctly.
When an object is passed to hash, its __hash__ method is invoked (if defined). The __hash__ method takes one positional argument (self), and should return an integer. It is acceptable for this integer to be negative.
The object class provides a __hash__ implementation, which by default derives its result from the id of the object. An object's id is implementation-specific, but in CPython, it is its memory address.
However, if an object defines an __eq__ method, the __hash__ method is implicitly set to None. This is done because of an ambiguity in the purpose of hashing generally. Depending on how they are being used, it may be desirable for every object to have a unique hash, or for equivalent objects to have matching hashes. And, “in the face of ambiguity, avoid the temptation to guess.”
Therefore, if a class should understand equivalence and be hashable, it must explicitly define its own __hash__ method.
Hashes are used in several places in the Python ecosystem. The two most common uses for them are for dictionary keys and in set objects. Only hashable objects can be used as dictionary keys. Similarly, only hashable objects can exist in Python set objects. In both cases, the hash is used to quickly locate a candidate entry; equivalence is then confirmed with == during set membership tests and dictionary key lookups.
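As a sketch of the rule stated previously: a Timespan that defines __eq__ must also explicitly define __hash__ to remain usable as a dictionary key or set member (the attribute names here follow the earlier examples).

```python
class Timespan:
    def __init__(self, hours=0, minutes=0, seconds=0):
        self.hours = hours
        self.minutes = minutes
        self.seconds = seconds

    def __eq__(self, other):
        return (self.hours, self.minutes, self.seconds) == \
               (other.hours, other.minutes, other.seconds)

    def __hash__(self):
        # Equivalent objects must return equal hashes, so hash the
        # same tuple of attributes that __eq__ compares.
        return hash((self.hours, self.minutes, self.seconds))

# Equivalent instances collapse to a single set member.
durations = {Timespan(1, 30), Timespan(1, 30)}
print(len(durations))  # 1
```

Without the __hash__ method, the __eq__ definition would implicitly set __hash__ to None, and the set literal above would raise TypeError.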
Another common Python built-in function is the format function, which is capable of formatting various kinds of objects according to Python's format specification.
Any object can provide a __format__ method, which is invoked if an object is passed to format. This method takes two positional arguments, the first being self, and the second being the format specification string.
In Python 3, the str.format method has replaced the % operator as the preferred way to handle templating within strings. If you pass an object with a __format__ method as an argument to str.format, this method will be called.
>>> from datetime import datetime
>>>
>>>
>>> class MyDate(datetime):
...     def __format__(self, spec_str):
...         if not spec_str:
...             spec_str = '%Y-%m-%d %H:%M:%S'
...         return self.strftime(spec_str)
...
>>>
>>> md = MyDate(2012, 4, 21, 11)
>>>
>>> '{0}'.format(md)
'2012-04-21 11:00:00'
Because the string used {0} with no additional formatting information, there was no format specification, and the default is used. However, note what happens when you provide one:
>>> '{0:%Y-%m-%d}'.format(md)
'2012-04-21'
The __format__ method is only called in this way when using the format method. It is not called if %-substitution is used within a string.
Although most type checking in Python is done using so-called duck typing (if obj.look()-s like a Duck and obj.quack()-s like a Duck, it's probably a Duck), it is also possible to test whether an object is an instance of a particular class using the built-in isinstance function. Similarly, you can test whether one class inherits from another using issubclass.
It is rarely necessary to customize this behavior. The isinstance function returns True if the object is an instance of the provided class or any subclass thereof (which is almost always what you want). Similarly, issubclass (despite its name) returns True if the same class is provided for both arguments (which is also almost always what you want).
Occasionally, though, it is desirable to allow classes to fake their identities. Python 2.6 introduced this possibility by providing the __instancecheck__ and __subclasscheck__ methods. Each of these methods takes two positional arguments, the first being the class itself, and the second being the object or class being tested against it (that is, the first argument to isinstance or issubclass). This allows classes to determine what objects may masquerade as their instances or subclasses.
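In practice, these methods are looked up on the class's type (its metaclass) rather than on the class itself, so a sketch looks like the following. The class names here are purely illustrative.

```python
class DuckMeta(type):
    def __instancecheck__(cls, obj):
        # Anything that can quack counts as an instance of the class.
        return hasattr(obj, 'quack')


class Duck(metaclass=DuckMeta):
    pass


class Robot:
    def quack(self):
        return 'beep'


print(isinstance(Robot(), Duck))  # True
print(isinstance(42, Duck))       # False
```

This is the same mechanism that abstract base classes in the standard library use to let unrelated classes register as "virtual" subclasses.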
Python provides built-in abs and round functions, which return the absolute value of a number and a rounded value, respectively.
Although it is not usually necessary for custom classes to define this behavior, they can do so by defining __abs__ and __round__, respectively. Both take one positional argument (self), and should return a numeric value.
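A brief sketch, using a hypothetical Money class, of what defining this behavior looks like:

```python
class Money:
    """Illustrative wrapper around a numeric amount."""

    def __init__(self, amount):
        self.amount = amount

    def __abs__(self):
        # abs(m) returns a new Money with a non-negative amount.
        return Money(abs(self.amount))

    def __round__(self, ndigits=None):
        # round(m) calls __round__(self); round(m, n) passes n as ndigits.
        return Money(round(self.amount, ndigits))


print(abs(Money(-3.5)).amount)          # 3.5
print(round(Money(3.14159), 2).amount)  # 3.14
```

Note that __round__ receives the optional second argument given to round, so it should accept (and usually forward) an ndigits parameter.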
Many objects are collections of various kinds of other objects. Most complex classes functionally come down to a collection of attributes (sorted in a meaningful way), as well as actions that the object can take.
Python has several ways of understanding “membership” of one object within another. For lists and dictionaries, for example, it is possible to test whether an object is a member of the collection by the expression needle in haystack (where needle is the variable being searched for, and haystack is the collection).
Dictionaries are made up of keys, and can perform lookup based on the key by evaluating haystack[key]. Similarly, most objects have attributes that are set during initialization or by other methods, which are accessed using dot notation (haystack.attr_name).
Python has magic methods that interact with all of these.
The __contains__ method is invoked when an expression such as needle in haystack is evaluated. This method takes two positional arguments (self, and then the needle), and should return True if the needle is considered to be present, and False if it is absent.
There is no strict requirement that this conform to object presence within another object, although that is the most common use case. Consider the following class that represents a range of dates:
from datetime import date

class DateRange(object):
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __contains__(self, needle):
        return self.start <= needle <= self.end
In this case, the __contains__ method determines whether the date is between the boundaries of the range.
>>> dr = DateRange(date(2015, 1, 1), date(2015, 12, 31))
>>> date(2015, 4, 21) in dr
True
>>> date(2012, 4, 21) in dr
False
The __getitem__ method and its siblings are used for key lookups on collections (such as dictionaries), or index or slice lookups on sequences (such as lists). In both cases, the fundamental expression being evaluated is haystack[key].
The __getitem__ method takes two arguments: self and key. It should return the appropriate value if present, or raise an appropriate exception if absent. What exception is appropriate varies somewhat based on the situation, but is usually one of IndexError, KeyError, or TypeError.
The __setitem__ method is used in the same situation, except that it is invoked when a value is being set to the collection, rather than being looked up. It takes three positional arguments rather than two: self, key, and value.
It is not a requirement that every object that supports item lookup necessarily support item changes. In other words, it is entirely acceptable to define __getitem__ and not define __setitem__ if this is the behavior that you want.
Finally, the __delitem__ method is invoked in the less common situation where an item is deleted with the del keyword (for example, del haystack[key]). It takes two positional arguments: self and key.
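A sketch tying the three methods together: a hypothetical dictionary-like class that normalizes its keys to lowercase, so lookups are case-insensitive.

```python
class CaseInsensitiveDict:
    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        # The underlying dict raises KeyError if the key is absent,
        # which is the appropriate exception to let propagate.
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]


headers = CaseInsensitiveDict()
headers['Content-Type'] = 'text/html'
print(headers['content-type'])  # text/html
del headers['CONTENT-TYPE']
```

Dropping the __setitem__ and __delitem__ methods would turn this into a read-only mapping, which (as noted previously) is entirely acceptable.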
The other major way that Python classes serve as collections is by being collections of attributes and objects. When a date object contains year, month, and day, those are attributes (which are set to integers in that case).
The __getattr__ method is invoked when attempting to get an attribute from an object, either with dot notation (such as obj.attr_name) or using the built-in getattr function (such as getattr(obj, 'attr_name')).
However, unlike other magic methods, it is important to realize that __getattr__ is only invoked if the attribute is not found on the object in the usual places. In other words, the Python interpreter will first do a standard attribute lookup, return that if there is a match, and if there is not a match (in other words, AttributeError would be raised), then and only then is the __getattr__ method called.
In other respects, it works similarly to __getitem__ (discussed previously). It accepts two positional arguments (self and the attribute name), and is expected to return an appropriate value, or raise AttributeError.
Similarly, the __setattr__ method is the writing equivalent of __getattr__. It is invoked when attempting to set an attribute on an object, whether by dot notation or using the built-in setattr function, and takes three positional arguments: self, the attribute name, and the value. Unlike __getattr__, it is always invoked (the method would be meaningless otherwise) and, therefore, should call the superclass method in situations where the traditional behavior is desired.
The reason why __getattr__ is only invoked if the attribute is not found is because this is ordinarily the desired behavior (otherwise, it would be very easy to fall into infinite recursion traps). However, the __getattribute__ method, unlike its more common counterpart, is called unconditionally.
The logical order here is that __getattribute__ is called first, and is ordinarily responsible for doing the traditional attribute lookup. If a class defines its own __getattribute__, it becomes responsible for calling the superclass implementation if it needs to do so. If (and only if) __getattribute__ raises AttributeError, __getattr__ is called.
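A sketch of this interplay, with illustrative names: __getattr__ supplies fallbacks only after normal lookup fails, while __setattr__ intercepts every write and must delegate to the superclass to actually store the value.

```python
class Recorded:
    def __init__(self):
        # Bypass our own __setattr__ so 'writes' itself is not recorded
        # (and so appending to it does not recurse).
        super().__setattr__('writes', [])

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        return 'default for %s' % name

    def __setattr__(self, name, value):
        # Called for every attribute write.
        self.writes.append(name)
        super().__setattr__(name, value)


obj = Recorded()
obj.x = 1
print(obj.writes)   # ['x']
print(obj.x)        # 1 (found normally; __getattr__ is not consulted)
print(obj.missing)  # default for missing
```

The call to super().__setattr__ inside __init__ is the kind of care that __setattr__ demands: writing through self directly would re-enter the overridden method before its bookkeeping list exists.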
A few other magic methods exist in addition to the ones described so far. In particular, Python implements an iterator protocol, which uses the __iter__ and __next__ methods. These are not discussed in detail here because they are discussed at length in Chapter 3, “Generators.”
Similarly, Python implements a rich language feature known as context managers, which make use of the __enter__ and __exit__ magic methods. These are also not discussed in detail here because they are discussed at length in Chapter 2, “Context Managers.”
The magic methods available to classes provide the Python language with a consistent data model that can be used across custom classes. This greatly enhances the readability of the language, in addition to providing hooks for classes of disparate types to interact with each other in predictable ways.
There is no reason to require that every custom class implement all of these methods, or even any of them. When writing a class, consider what functionality you need. However, if the functionality needed maps cleanly to an already defined method here, it is preferable to implement these rather than provide your own custom spelling.
In Chapter 5, you will learn about metaclasses.